Introduction

Many times there are multiple data sets that we’d like to download from a website. Sometimes we are fortunate enough that the data is stored in multiple csv files within a zipped folder or under a common link. But what if the data is not centrally stored in a single zipped file or is not in csv format? For this post we will extract data that is stored in multiple pdf files from the Florida Division of Elections website.

Pulling the data

Let’s utilize the html_nodes and html_attr functions from the rvest package to extract all links from the webpage then zoom in on only the pdf documents.

library(tidyverse)
library(tabulizer)
library(tabulizerjars)
library(knitr)
library(kableExtra)
library(rvest)
library(reshape)
library(scales)


url <- "http://dos.myflorida.com/elections/data-statistics/elections-data/general-election-summaries/"

#extract link names from page
links <- read_html(url) %>%
  html_nodes("a") %>% 
  html_attr("href")

#detect pdf files we need
pdf_links <- links[str_detect(links, ".pdf")]
pdf_links
##  [1] "/media/697842/2016-ge-summaries-ballots-by-type-activity.pdf"   
##  [2] "/media/697841/2016-ge-summaries-provisional-voting.pdf"         
##  [3] "/media/697839/2016-ge-summaries-poll-workers.pdf"               
##  [4] "/media/697840/2016-ge-summaries-precincts-and-polling-sites.pdf"
##  [5] "/media/694976/2014ballotscast.pdf"                              
##  [6] "/media/694977/2014provisionalballots.pdf"                       
##  [7] "/media/694978/2014pollworkers.pdf"                              
##  [8] "/media/694975/2014pollingsites.pdf"                             
##  [9] "/media/693340/2012ballotscast.pdf"                              
## [10] "/media/693345/2012provisionalballots.pdf"                       
## [11] "/media/693338/2012pollworkers.pdf"                              
## [12] "/media/693339/2012pollingsites.pdf"                             
## [13] "/media/693353/2010ballotscast.pdf"                              
## [14] "/media/693336/2010provisionalballots.pdf"                       
## [15] "/media/693337/2010pollworkers.pdf"                              
## [16] "/media/693355/2010pollingsites.pdf"                             
## [17] "/media/693351/2008ballotscast.pdf"                              
## [18] "/media/693354/2008provisionalballots.pdf"                       
## [19] "/media/693352/2008pollworkers.pdf"                              
## [20] "/media/693349/2008pollingsites.pdf"                             
## [21] "/media/693346/2006ballotscast.pdf"                              
## [22] "/media/693348/2006provisionalballots.pdf"                       
## [23] "/media/693350/2006pollworkers.pdf"                              
## [24] "/media/693347/2006pollingsites.pdf"                             
## [25] "/media/693341/2004ballotscast.pdf"                              
## [26] "/media/693836/2004provisionalballots.pdf"                       
## [27] "/media/693343/2004pollworkerinfo.pdf"                           
## [28] "/media/693342/2004pollingsites.pdf"

Based on the output we see that there are 28 unique links with the .pdf extension. Our main focus is the voter turnout data for the 2004 to 2016 general elections. This data is contained in the ballotscast.pdf for 2004 to 2014 and in summaries-ballots-by-type-activity.pdf for 2016. Now that we know which links contain the relevant data we can extract them using str_detect. Let’s use ballotscast and type as keywords for identifying the necessary pdf files.

links_data <- pdf_links[str_detect(pdf_links,"ballotscast|type")]
links_data
## [1] "/media/697842/2016-ge-summaries-ballots-by-type-activity.pdf"
## [2] "/media/694976/2014ballotscast.pdf"                           
## [3] "/media/693340/2012ballotscast.pdf"                           
## [4] "/media/693353/2010ballotscast.pdf"                           
## [5] "/media/693351/2008ballotscast.pdf"                           
## [6] "/media/693346/2006ballotscast.pdf"                           
## [7] "/media/693341/2004ballotscast.pdf"

Great! We are now left with seven links. Upon looking at the pdf files, directly from the website, we notice that the files prior to 2012 are formatted differently than the files for 2014 and 2016. Additionally, the 2012 file is formatted differently than all the other files. This makes it difficult to import all the files in one pull so we will have to import and tidy the data in three sections. The first section will focus on the 2014 and 2016 files, the second on the 2012 file and the third on the 2004 to 2010 files. We’ll extract the data from each file, convert each file to a data frame, tidy it and finally combine all the data frames into one. Now who’s ready to have some fun?!

Extracting all Years (excluding 2012)

The first step will be to read in relevant files corresponding to this time frame. This can be achieved in three steps:

  1. Separate the 2012 link from all other links
  2. Create a read_url function to extract the data from the pdf files, turn it into a list of data frames then append the year to each data frame
  3. Paste the .pdf portion of each link to the base URL and use read_url to save each file as a data frame
#exclude 2012 link as this file is formatted differently than others
all_links <- links_data[-3]

#only 2012 link
link_2012 <- links_data[3]

#create read_url function #
read_url <- function(url){
  df <- extract_tables(url) 
  df <- do.call(rbind, df) %>%  
    data.frame()   #create a dataframe for each file within the list
  df <- cbind(df,'year' = substr(url,39,42)) #get year for each df
}

#apply function to each link
for(i in seq_along(all_links)){
  df <- read_url(paste0("http://dos.myflorida.com",all_links[i]))
  assign(paste0("df", i),df)
}

Executing read_url within the for loop created six data frames df1, df2,…, df6 corresponding to the years 2004 to 2016 but excluding 2012. Thus, df1 represents 2016 data, df2 represents 2014 data, df3 represents 2010 data and so on.

Inspect Data

Lets look at the rows of df1 to determine its structure.

kable(df1,row.names = FALSE) %>% 
   kable_styling() %>%
   scroll_box(width = "100%", height = "400px")
. year
FLORIDA DEPARTMENT OF STATE 2016
DIVISION OF ELECTIONS 2016
Voting Activity by Ballot Type for 2016 General Election 2016
Source: Responses by the County Supervisors of Elections to the U.S. Election Assistance Commission’s 2016 Election 2016
Administration and Voting Survey. (Columns F1b, F1f, F1c+F1d , E1a of the survey) . Data reflects county‐specific number of 2016
persons who voted and attempted to vote by one of the four categories below. For official election results of ballots cast 2016
and counted for eligible/registered voters, refer to: http://doe.dos.state.fl.us/elections/resultsarchive/index.asp 2016
Vote‐by‐Mail 2016
County Physical Poll ‐Election Day Early Vote Center Provisional Voting back‐up (Federal Write‐domestic/overseas and 2016
In Absentee Ballot) 2016
Alachua 50,491 51,097 29,040 993 2016
Baker 3,584 7,180 2,003 1 2016
Bay 25,579 43,963 18,952 302 2016
Bradford 5,095 4,675 2,450 17 2016
Brevard 109,623 118,427 92,708 370 2016
Broward 213,112 426,592 202,904 1,623 2016
Calhoun 2,448 2,692 1,053 21 2016
Charlotte 23,477 39,644 35,019 71 2016
Citrus 26,556 29,323 25,105 157 2016
Clay 33,064 54,533 20,165 101 2016
Collier 42,681 74,932 57,159 688 2016
Columbia 9,132 14,044 5,884 27 2016
DeSoto 4,274 4,452 2,216 57 2016
Dixie 3,011 2,319 1,935 19 2016
Duval 136,287 227,475 76,053 1,108 2016
Escambia 59,696 56,213 39,791 371 2016
Flagler 15,191 29,696 13,537 210 2016
Franklin 2,264 2,314 1,533 15 2016
Gadsden 8,359 9,627 4,317 33 2016
Gilchrist 4,233 2,412 1,865 6 2016
Glades 2,720 863 831 0 2016
Gulf 2,293 3,612 1,456 19 2016
Hamilton 2,430 1,702 1,392 11 2016
Hardee 3,241 3,618 829 16 2016
Hendry 3,597 5,810 1,861 9 2016
Hernando 35,004 25,911 34,558 54 2016
Highlands 16,149 18,996 11,197 139 2016
Hillsborough 173,637 242,184 191,201 2,879 2016
Holmes 3,669 2,901 2,044 10 2016
Indian River 25,359 33,136 23,196 227 2016
Jackson 7,970 9,116 4,179 46 2016
Jefferson 3,553 2,883 1,317 7 2016
Lafayette 1,653 1,147 624 16 2016
Lake 65,105 75,197 32,672 643 2016
FLORIDA DEPARTMENT OF STATE 2016
DIVISION OF ELECTIONS 2016
Voting Activity by Ballot Type for 2016 General Election 2016
Source: Responses by the County Supervisors of Elections to the U.S. Election Assistance Commission’s 2016 Election 2016
Administration and Voting Survey. (Columns F1b, F1f, F1c+F1d , E1a of the survey) . Data reflects county‐specific number of 2016
persons who voted and attempted to vote by one of the four categories below. For official election results of ballots cast 2016
and counted for eligible/registered voters, refer to: http://doe.dos.state.fl.us/elections/resultsarchive/index.asp 2016
Vote‐by‐Mail 2016
County Physical Poll ‐Early Vote Center domestic/overseas and Election Day Provisional Voting back‐up (Federal Write‐ 2016
In Absentee Ballot) 2016
Lee 74,735 100,893 158,302 730 2016
Leon 55,731 67,518 31,792 365 2016
Levy 9,287 5,108 5,182 84 2016
Liberty 1,497 1,347 508 1 2016
Madison 2,792 4,554 1,261 80 2016
Manatee 64,556 48,604 68,494 524 2016
Marion 72,776 66,191 38,146 285 2016
Martin 23,118 34,038 30,375 83 2016
Miami‐Dade 217,803 476,001 311,597 2,973 2016
Monroe 15,597 13,325 14,398 68 2016
Nassau 13,480 23,515 10,294 110 2016
Okaloosa 34,348 45,186 23,318 355 2016
Okeechobee 5,393 5,879 2,596 24 2016
Orange 141,015 255,113 169,031 714 2016
Osceola 34,044 63,254 45,228 284 2016
Palm Beach 278,030 241,376 152,226 2,330 2016
Pasco 100,110 77,027 68,565 343 2016
Pinellas 168,563 78,177 255,374 754 2016
Polk 116,350 92,598 80,091 1,134 2016
Putnam 12,996 13,856 6,621 74 2016
Santa Rosa 32,823 37,197 19,143 334 2016
Sarasota 75,810 79,817 77,811 612 2016
Seminole 56,794 109,262 63,291 629 2016
St. Johns 39,611 71,617 27,329 228 2016
St. Lucie 44,579 57,289 41,389 135 2016
Sumter 12,517 45,545 19,812 13 2016
Suwannee 8,006 6,490 4,374 46 2016
Taylor 4,011 3,099 2,257 38 2016
Union 2,412 2,317 1,025 9 2016
Volusia 91,647 97,936 74,939 680 2016
Wakulla 5,548 6,824 3,152 8 2016
Walton 13,519 13,439 7,141 137 2016
Washington 5,050 3,675 2,509 10 2016
TOTAL 2,959,085 3,876,753 2,758,617 24,460 2016

 

We see that the data frame contains two columns. It appears that due to the format of the pdf file all the columns from the file were compressed into the first column of the data frame along with the page headings. If we were to scroll to the end of the data frame we’d also notice that a summary row was included for each of the ballot types. As the data was not imported as expected we will have to do some extra work to get it into tidy form. For now we’ll focus on df1 and df2 as they are formatted identically. We’ll deal with df3 to df6 later on.

Post 2012 Data

Preparing this data frame will happen in 2 main steps:

  1. Combine df1 and df2 into df_12 and remove rows with non valid data in column one
  2. Separate column one into four columns

Step 1

We’ll first combine df1 and df2 then identify the rows to remove by using non-county keywords

#combine df1 and df2
df_12 <- rbind(df1,df2)

#rename the columns and convert to lower case
names(df_12) <- c("values","year")
df_12$values <- tolower(df_12$values)

#detect non-county strings
to_exclude <- which(str_detect(df_12$values, "vote|division|total|florida|
                                             source|administration|persons|county|
                                             voting|absentee|election"))
#exclude rows that do not contain county data
df_12 <- df_12[-to_exclude, ] 

head(df_12)
##                                   values year
## 11      alachua 50,491 51,097 29,040 993 2016
## 12             baker 3,584 7,180 2,003 1 2016
## 13          bay 25,579 43,963 18,952 302 2016
## 14         bradford 5,095 4,675 2,450 17 2016
## 15    brevard 109,623 118,427 92,708 370 2016
## 16 broward 213,112 426,592 202,904 1,623 2016

Looking at the above output we see that the values column contains the county followed by the number of ballots for each vote type. Thus, we can create a county column by extracting only the words from values. Additionally, we’ll create a totals column by removing the commas and extracting the digits portion of the values column.

Step 2

for (i in 1:nrow(df_12)){
  df_12$county[i] <- str_extract_all(df_12$values[i], "[[:alpha:]]+") %>% #grab all words
    unlist() %>% 
    paste(collapse = " ")
  
  df_12$totals[i] <- str_replace_all(df_12$values[i], '[[:punct:]]', "") %>% #remove commas
    str_extract_all("[[:digit:]]+") %>% #grab all digits
    unlist() %>% 
    paste(collapse = " ")
}

kable(df_12,row.names=FALSE) %>% 
   kable_styling() %>%
   scroll_box(width = "100%", height = "400px")
values year county totals
alachua 50,491 51,097 29,040 993 2016 alachua 50491 51097 29040 993
baker 3,584 7,180 2,003 1 2016 baker 3584 7180 2003 1
bay 25,579 43,963 18,952 302 2016 bay 25579 43963 18952 302
bradford 5,095 4,675 2,450 17 2016 bradford 5095 4675 2450 17
brevard 109,623 118,427 92,708 370 2016 brevard 109623 118427 92708 370
broward 213,112 426,592 202,904 1,623 2016 broward 213112 426592 202904 1623
calhoun 2,448 2,692 1,053 21 2016 calhoun 2448 2692 1053 21
charlotte 23,477 39,644 35,019 71 2016 charlotte 23477 39644 35019 71
citrus 26,556 29,323 25,105 157 2016 citrus 26556 29323 25105 157
clay 33,064 54,533 20,165 101 2016 clay 33064 54533 20165 101
collier 42,681 74,932 57,159 688 2016 collier 42681 74932 57159 688
columbia 9,132 14,044 5,884 27 2016 columbia 9132 14044 5884 27
desoto 4,274 4,452 2,216 57 2016 desoto 4274 4452 2216 57
dixie 3,011 2,319 1,935 19 2016 dixie 3011 2319 1935 19
duval 136,287 227,475 76,053 1,108 2016 duval 136287 227475 76053 1108
escambia 59,696 56,213 39,791 371 2016 escambia 59696 56213 39791 371
flagler 15,191 29,696 13,537 210 2016 flagler 15191 29696 13537 210
franklin 2,264 2,314 1,533 15 2016 franklin 2264 2314 1533 15
gadsden 8,359 9,627 4,317 33 2016 gadsden 8359 9627 4317 33
gilchrist 4,233 2,412 1,865 6 2016 gilchrist 4233 2412 1865 6
glades 2,720 863 831 0 2016 glades 2720 863 831 0
gulf 2,293 3,612 1,456 19 2016 gulf 2293 3612 1456 19
hamilton 2,430 1,702 1,392 11 2016 hamilton 2430 1702 1392 11
hardee 3,241 3,618 829 16 2016 hardee 3241 3618 829 16
hendry 3,597 5,810 1,861 9 2016 hendry 3597 5810 1861 9
hernando 35,004 25,911 34,558 54 2016 hernando 35004 25911 34558 54
highlands 16,149 18,996 11,197 139 2016 highlands 16149 18996 11197 139
hillsborough 173,637 242,184 191,201 2,879 2016 hillsborough 173637 242184 191201 2879
holmes 3,669 2,901 2,044 10 2016 holmes 3669 2901 2044 10
indian river 25,359 33,136 23,196 227 2016 indian river 25359 33136 23196 227
jackson 7,970 9,116 4,179 46 2016 jackson 7970 9116 4179 46
jefferson 3,553 2,883 1,317 7 2016 jefferson 3553 2883 1317 7
lafayette 1,653 1,147 624 16 2016 lafayette 1653 1147 624 16
lake 65,105 75,197 32,672 643 2016 lake 65105 75197 32672 643
lee 74,735 100,893 158,302 730 2016 lee 74735 100893 158302 730
leon 55,731 67,518 31,792 365 2016 leon 55731 67518 31792 365
levy 9,287 5,108 5,182 84 2016 levy 9287 5108 5182 84
liberty 1,497 1,347 508 1 2016 liberty 1497 1347 508 1
madison 2,792 4,554 1,261 80 2016 madison 2792 4554 1261 80
manatee 64,556 48,604 68,494 524 2016 manatee 64556 48604 68494 524
marion 72,776 66,191 38,146 285 2016 marion 72776 66191 38146 285
martin 23,118 34,038 30,375 83 2016 martin 23118 34038 30375 83
miami‐dade 217,803 476,001 311,597 2,973 2016 miamiâ dade 217803 476001 311597 2973
monroe 15,597 13,325 14,398 68 2016 monroe 15597 13325 14398 68
nassau 13,480 23,515 10,294 110 2016 nassau 13480 23515 10294 110
okaloosa 34,348 45,186 23,318 355 2016 okaloosa 34348 45186 23318 355
okeechobee 5,393 5,879 2,596 24 2016 okeechobee 5393 5879 2596 24
orange 141,015 255,113 169,031 714 2016 orange 141015 255113 169031 714
osceola 34,044 63,254 45,228 284 2016 osceola 34044 63254 45228 284
palm beach 278,030 241,376 152,226 2,330 2016 palm beach 278030 241376 152226 2330
pasco 100,110 77,027 68,565 343 2016 pasco 100110 77027 68565 343
pinellas 168,563 78,177 255,374 754 2016 pinellas 168563 78177 255374 754
polk 116,350 92,598 80,091 1,134 2016 polk 116350 92598 80091 1134
putnam 12,996 13,856 6,621 74 2016 putnam 12996 13856 6621 74
santa rosa 32,823 37,197 19,143 334 2016 santa rosa 32823 37197 19143 334
sarasota 75,810 79,817 77,811 612 2016 sarasota 75810 79817 77811 612
seminole 56,794 109,262 63,291 629 2016 seminole 56794 109262 63291 629
st. johns 39,611 71,617 27,329 228 2016 st johns 39611 71617 27329 228
st. lucie 44,579 57,289 41,389 135 2016 st lucie 44579 57289 41389 135
sumter 12,517 45,545 19,812 13 2016 sumter 12517 45545 19812 13
suwannee 8,006 6,490 4,374 46 2016 suwannee 8006 6490 4374 46
taylor 4,011 3,099 2,257 38 2016 taylor 4011 3099 2257 38
union 2,412 2,317 1,025 9 2016 union 2412 2317 1025 9
volusia 91,647 97,936 74,939 680 2016 volusia 91647 97936 74939 680
wakulla 5,548 6,824 3,152 8 2016 wakulla 5548 6824 3152 8
walton 13,519 13,439 7,141 137 2016 walton 13519 13439 7141 137
washington 5,050 3,675 2,509 10 2016 washington 5050 3675 2509 10
alachua 41,973 17,983 19,091 554 2014 alachua 41973 17983 19091 554
baker 4,350 2,621 1,653 0 2014 baker 4350 2621 1653 0
bay 26,898 17,278 14,225 185 2014 bay 26898 17278 14225 185
bradford 4,997 2,062 1,640 8 2014 bradford 4997 2062 1640 8
brevard 118,826 33,327 73,765 403 2014 brevard 118826 33327 73765 403
broward 223,095 129,248 123,820 311 2014 broward 223095 129248 123820 311
calhoun 2,297 1,258 701 10 2014 calhoun 2297 1258 701 10
charlotte 23,534 16,463 28,194 38 2014 charlotte 23534 16463 28194 38
citrus 25,484 13,800 19,777 102 2014 citrus 25484 13800 19777 102
clay 36,089 19,082 13,699 36 2014 clay 36089 19082 13699 36
collier 42,921 31,768 40,603 180 2014 collier 42921 31768 40603 180
columbia 8,658 6,161 3,828 10 2014 columbia 8658 6161 3828 10
desoto 3,844 2,158 1,622 5 2014 desoto 3844 2158 1622 5
dixie 2,827 1,196 1,489 7 2014 dixie 2827 1196 1489 7
duval 143,441 73,604 55,759 1,434 2014 duval 143441 73604 55759 1434
escambia 53,510 18,421 27,453 85 2014 escambia 53510 18421 27453 85
flagler 15,407 13,984 9,091 56 2014 flagler 15407 13984 9091 56
franklin 2,253 998 1,128 11 2014 franklin 2253 998 1128 11
gadsden 9,099 5,162 3,533 6 2014 gadsden 9099 5162 3533 6
gilchrist 3,624 1,035 1,405 4 2014 gilchrist 3624 1035 1405 4
glades 2,057 245 775 0 2014 glades 2057 245 775 0
gulf 2,288 1,830 1,090 13 2014 gulf 2288 1830 1090 13
hamilton 2,057 662 1,049 2 2014 hamilton 2057 662 1049 2
hardee 1,855 1,842 789 5 2014 hardee 1855 1842 789 5
hendry 3,100 2,281 1,400 7 2014 hendry 3100 2281 1400 7
hernando 31,957 9,781 23,154 58 2014 hernando 31957 9781 23154 58
highlands 15,466 9,126 7,900 115 2014 highlands 15466 9126 7900 115
hillsborough 156,529 88,379 131,659 1,923 2014 hillsborough 156529 88379 131659 1923
holmes 3,039 1,256 1,512 12 2014 holmes 3039 1256 1512 12
indian river 24,047 14,749 14,783 115 2014 indian river 24047 14749 14783 115
jackson 7,349 4,354 3,644 24 2014 jackson 7349 4354 3644 24
jefferson 3,550 1,587 1,192 27 2014 jefferson 3550 1587 1192 27
lafayette 1,500 693 480 3 2014 lafayette 1500 693 480 3
lake 60,379 28,798 24,980 327 2014 lake 60379 28798 24980 327
lee 70,887 38,077 104,345 761 2014 lee 70887 38077 104345 761
leon 58,289 27,885 22,939 295 2014 leon 58289 27885 22939 295
levy 7,222 1,993 4,315 67 2014 levy 7222 1993 4315 67
liberty 1,514 815 390 4 2014 liberty 1514 815 390 4
madison 3,250 2,122 1,040 59 2014 madison 3250 2122 1040 59
manatee 62,249 13,898 45,344 103 2014 manatee 62249 13898 45344 103
marion 67,987 20,715 33,109 48 2014 marion 67987 20715 33109 48
martin 23,313 14,615 24,621 41 2014 martin 23313 14615 24621 41
miami‐dade 227,275 116,936 189,324 875 2014 miamiâ dade 227275 116936 189324 875
monroe 13,633 5,665 9,818 35 2014 monroe 13633 5665 9818 35
nassau 14,388 8,679 7,888 42 2014 nassau 14388 8679 7888 42
okaloosa 33,757 14,877 13,409 84 2014 okaloosa 33757 14877 13409 84
okeechobee 4,665 2,228 2,147 19 2014 okeechobee 4665 2228 2147 19
orange 130,688 71,962 110,637 379 2014 orange 130688 71962 110637 379
osceola 31,266 15,866 22,544 152 2014 osceola 31266 15866 22544 152
palm beach 241,002 93,593 86,952 1,563 2014 palm beach 241002 93593 86952 1563
pasco 83,556 32,455 46,912 137 2014 pasco 83556 32455 46912 137
pinellas 133,828 18,987 204,706 292 2014 pinellas 133828 18987 204706 292
polk 104,849 29,473 60,309 301 2014 polk 104849 29473 60309 301
putnam 12,596 5,422 5,041 21 2014 putnam 12596 5422 5041 21
santa rosa 30,964 11,982 10,785 110 2014 santa rosa 30964 11982 10785 110
sarasota 77,364 30,903 55,160 285 2014 sarasota 77364 30903 55160 285
seminole 66,202 36,368 45,222 229 2014 seminole 66202 36368 45222 229
st. johns 44,479 26,845 16,572 70 2014 st johns 44479 26845 16572 70
st. lucie 41,534 19,632 29,399 111 2014 st lucie 41534 19632 29399 111
sumter 16,260 25,471 13,941 19 2014 sumter 16260 25471 13941 19
suwannee 6,947 2,424 3,498 35 2014 suwannee 6947 2424 3498 35
taylor 3,738 1,206 1,826 23 2014 taylor 3738 1206 1826 23
union 2,396 1,713 1,055 5 2014 union 2396 1713 1055 5
volusia 88,743 33,259 56,719 281 2014 volusia 88743 33259 56719 281
wakulla 5,911 3,176 2,599 6 2014 wakulla 5911 3176 2599 6
walton 11,495 4,763 4,311 56 2014 walton 11495 4763 4311 56
washington 4,454 1,846 2,065 9 2014 washington 4454 1846 2065 9

 

It appears that our process has worked. There is a column for county and also a column for totals. Let’s make sure that the counties are correctly spelled before moving forward.

unique(df_12$county)
##  [1] "alachua"      "baker"        "bay"          "bradford"    
##  [5] "brevard"      "broward"      "calhoun"      "charlotte"   
##  [9] "citrus"       "clay"         "collier"      "columbia"    
## [13] "desoto"       "dixie"        "duval"        "escambia"    
## [17] "flagler"      "franklin"     "gadsden"      "gilchrist"   
## [21] "glades"       "gulf"         "hamilton"     "hardee"      
## [25] "hendry"       "hernando"     "highlands"    "hillsborough"
## [29] "holmes"       "indian river" "jackson"      "jefferson"   
## [33] "lafayette"    "lake"         "lee"          "leon"        
## [37] "levy"         "liberty"      "madison"      "manatee"     
## [41] "marion"       "martin"       "miamiâ dade"  "monroe"      
## [45] "nassau"       "okaloosa"     "okeechobee"   "orange"      
## [49] "osceola"      "palm beach"   "pasco"        "pinellas"    
## [53] "polk"         "putnam"       "santa rosa"   "sarasota"    
## [57] "seminole"     "st johns"     "st lucie"     "sumter"      
## [61] "suwannee"     "taylor"       "union"        "volusia"     
## [65] "wakulla"      "walton"       "washington"

Every county looks fine except for Miami-Dade. We’ll rename this county then separate the totals column into election_day, early_voting and absentee columns.

df_12 <- df_12 %>% 
  mutate(county = gsub("miami? dade", "miami-dade", county)) %>% #correct miami-dade spelling
  separate(totals, c("election_day", "early_voting", "absentee", "provisional")) %>% #separate columns
  select(-c(values, provisional)) #remove unnecessary columns

kable(df_12) %>% 
   kable_styling() %>%
   scroll_box(width = "100%", height = "400px")
year county election_day early_voting absentee
2016 alachua 50491 51097 29040
2016 baker 3584 7180 2003
2016 bay 25579 43963 18952
2016 bradford 5095 4675 2450
2016 brevard 109623 118427 92708
2016 broward 213112 426592 202904
2016 calhoun 2448 2692 1053
2016 charlotte 23477 39644 35019
2016 citrus 26556 29323 25105
2016 clay 33064 54533 20165
2016 collier 42681 74932 57159
2016 columbia 9132 14044 5884
2016 desoto 4274 4452 2216
2016 dixie 3011 2319 1935
2016 duval 136287 227475 76053
2016 escambia 59696 56213 39791
2016 flagler 15191 29696 13537
2016 franklin 2264 2314 1533
2016 gadsden 8359 9627 4317
2016 gilchrist 4233 2412 1865
2016 glades 2720 863 831
2016 gulf 2293 3612 1456
2016 hamilton 2430 1702 1392
2016 hardee 3241 3618 829
2016 hendry 3597 5810 1861
2016 hernando 35004 25911 34558
2016 highlands 16149 18996 11197
2016 hillsborough 173637 242184 191201
2016 holmes 3669 2901 2044
2016 indian river 25359 33136 23196
2016 jackson 7970 9116 4179
2016 jefferson 3553 2883 1317
2016 lafayette 1653 1147 624
2016 lake 65105 75197 32672
2016 lee 74735 100893 158302
2016 leon 55731 67518 31792
2016 levy 9287 5108 5182
2016 liberty 1497 1347 508
2016 madison 2792 4554 1261
2016 manatee 64556 48604 68494
2016 marion 72776 66191 38146
2016 martin 23118 34038 30375
2016 miamiâ dade 217803 476001 311597
2016 monroe 15597 13325 14398
2016 nassau 13480 23515 10294
2016 okaloosa 34348 45186 23318
2016 okeechobee 5393 5879 2596
2016 orange 141015 255113 169031
2016 osceola 34044 63254 45228
2016 palm beach 278030 241376 152226
2016 pasco 100110 77027 68565
2016 pinellas 168563 78177 255374
2016 polk 116350 92598 80091
2016 putnam 12996 13856 6621
2016 santa rosa 32823 37197 19143
2016 sarasota 75810 79817 77811
2016 seminole 56794 109262 63291
2016 st johns 39611 71617 27329
2016 st lucie 44579 57289 41389
2016 sumter 12517 45545 19812
2016 suwannee 8006 6490 4374
2016 taylor 4011 3099 2257
2016 union 2412 2317 1025
2016 volusia 91647 97936 74939
2016 wakulla 5548 6824 3152
2016 walton 13519 13439 7141
2016 washington 5050 3675 2509
2014 alachua 41973 17983 19091
2014 baker 4350 2621 1653
2014 bay 26898 17278 14225
2014 bradford 4997 2062 1640
2014 brevard 118826 33327 73765
2014 broward 223095 129248 123820
2014 calhoun 2297 1258 701
2014 charlotte 23534 16463 28194
2014 citrus 25484 13800 19777
2014 clay 36089 19082 13699
2014 collier 42921 31768 40603
2014 columbia 8658 6161 3828
2014 desoto 3844 2158 1622
2014 dixie 2827 1196 1489
2014 duval 143441 73604 55759
2014 escambia 53510 18421 27453
2014 flagler 15407 13984 9091
2014 franklin 2253 998 1128
2014 gadsden 9099 5162 3533
2014 gilchrist 3624 1035 1405
2014 glades 2057 245 775
2014 gulf 2288 1830 1090
2014 hamilton 2057 662 1049
2014 hardee 1855 1842 789
2014 hendry 3100 2281 1400
2014 hernando 31957 9781 23154
2014 highlands 15466 9126 7900
2014 hillsborough 156529 88379 131659
2014 holmes 3039 1256 1512
2014 indian river 24047 14749 14783
2014 jackson 7349 4354 3644
2014 jefferson 3550 1587 1192
2014 lafayette 1500 693 480
2014 lake 60379 28798 24980
2014 lee 70887 38077 104345
2014 leon 58289 27885 22939
2014 levy 7222 1993 4315
2014 liberty 1514 815 390
2014 madison 3250 2122 1040
2014 manatee 62249 13898 45344
2014 marion 67987 20715 33109
2014 martin 23313 14615 24621
2014 miamiâ dade 227275 116936 189324
2014 monroe 13633 5665 9818
2014 nassau 14388 8679 7888
2014 okaloosa 33757 14877 13409
2014 okeechobee 4665 2228 2147
2014 orange 130688 71962 110637
2014 osceola 31266 15866 22544
2014 palm beach 241002 93593 86952
2014 pasco 83556 32455 46912
2014 pinellas 133828 18987 204706
2014 polk 104849 29473 60309
2014 putnam 12596 5422 5041
2014 santa rosa 30964 11982 10785
2014 sarasota 77364 30903 55160
2014 seminole 66202 36368 45222
2014 st johns 44479 26845 16572
2014 st lucie 41534 19632 29399
2014 sumter 16260 25471 13941
2014 suwannee 6947 2424 3498
2014 taylor 3738 1206 1826
2014 union 2396 1713 1055
2014 volusia 88743 33259 56719
2014 wakulla 5911 3176 2599
2014 walton 11495 4763 4311
2014 washington 4454 1846 2065

 

One final check of the data reveals everything is correct! Now let’s proceed to the 2012 data.

2012 Data

As data for 2012 was not extracted with the other data we will first have to extract it.

df7 <- extract_tables(paste0("http://dos.myflorida.com", link_2012))
glimpse(df7)
## List of 2
##  $ : chr [1:42, 1] "FLORIDA DEPARTMENT OF STATE" "Division of Elections" "Ballots-by-Type Activity for 2012 General Election" "" ...
##  $ : chr [1:38, 1:5] "Jackson" "Jefferson" "Lafayette" "Lake" ...

It appears that df7 was extracted as a list containing two elements.The first element contains contains 42 rows and 1 column which is the same structure of the values column from df_12. The second element of the list contains 38 rows and 5 columns. Note that both elements of the list contain data for 2012 but the data was split into two different parts due to the format of the pdf file, i.e. the first page contained headers and a link while the second page only contained a table of data. We’ll convert each element to a data frame then combine the two.

Let’s have a look at the data to better understand it. The left portion represents the data from the first list element while the right portion represents the second. We see how the headings on the first page of the pdf have completely messed with the structure of the first table. In fact, the data in the second list element is practically in tidy format, It just needs to be converted to a data frame and given column headings.

kable(df7) %>% 
   kable_styling() %>%
   scroll_box(width = "100%", height = "400px")
FLORIDA DEPARTMENT OF STATE
Division of Elections
Ballots-by-Type Activity for 2012 General Election
*Source: US Election Assistance Commission- 2012 Election Administration and Voting Survey Results.
Data reflects voting activity for all persons who voted and attempted to vote, registered or not. For official election results of
ballots cast and counted for eligible/registered voters, refer to: http://doe.dos.state.fl.us/elections/resultsarchive/index.asp
Physical Poll
Voted Using
Voters on Election Voted Using Early Domestic or Voted Using
County Day Vote Center UOCAVA Absentee Provisional Ballot
(F1b)* (F1f)* Ballot or FWAB (F1c + F1d)* (E1a)*
Alachua 54204 39271 27393 1181
Baker 4101 3601 1880 0
Bay 29255 33857 17797 460
Bradford 5869 3958 1887 10
Brevard 144657 51449 91170 929
Broward 341637 245952 171785 2971
Calhoun 2822 2273 1112 58
Charlotte 24979 26688 33653 69
Citrus 26650 24686 23031 254
Clay 36171 41692 18842 423
Collier 49089 52290 48646 580
Columbia 9271 13177 4829 49
DeSoto 4694 3138 2107 25
Dixie 3776 1262 1959 11
Duval 157365 174420 82913 6803
Escambia 67630 42861 39647 511
Flagler 20070 19906 10651 209
Franklin 2618 1470 1437 19
Gadsden 8895 9598 4123 100
Gilchrist 4249 1935 1789 12
Glades 2541 498 943 44
Gulf 2591 3046 1517 23
Hamilton 2659 1442 1355 20
Hardee 3346 2846 1083 7
Hendry 4537 3846 1933 23
Hernando 40030 12752 27806 240
Highlands 16103 16991 9806 290
Hillsborough 205283 166990 171666 5074
Holmes 4091 2391 1892 30
Indian River 23841 28646 19630 329
Jackson 8913 7777 4371 126
Jefferson 3904 2481 1454 18
Lafayette 1892 962 589 0
Lake 63751 57612 30793 274
Lee 125346 51932 89638 1347
Leon 72848 45110 30316 1136
Levy 9816 3653 5108 81
Liberty 1617 1210 496 6
Madison 3608 3669 1411 67
Manatee 91799 16357 45880 545
Marion 80570 43209 39052 138
Martin 21739 26335 31425 224
Miami-Dade 406841 235916 247285 2828
Monroe 14491 11106 13778 102
Nassau 11658 18339 10772 166
Okaloosa 40005 32847 22496 282
Okeechobee 5895 3525 2975 36
Orange 205018 127583 137125 3096
Osceola 41532 41736 26484 519
Palm Beach 351600 124833 130532 2056
Pasco 94321 60974 59648 719
Pinellas 172605 39569 250113 1483
Polk 128094 55120 66392 1536
Putnam 14169 10622 6743 152
Santa Rosa 37931 22639 16648 279
Sarasota 78328 55473 73876 946
Seminole 89594 66721 53189 1700
St. Johns 39631 52565 23515 336
St. Lucie 48960 37353 37041 698
Sumter 16559 30269 14028 38
Suwannee 8941 4593 4276 66
Taylor 4490 2518 2148 62
Union 2645 1662 1139 7
Volusia 110545 61097 63821 648
Wakulla 5940 5431 3321 56
Walton 13206 10088 5319 127
Washington 5120 3279 2636 91
TOTAL 3,736,946 2,409,097 2,380,115 42,745

 

#extract first list element
first_table <- df7[[1]]
first_table <- first_table[13:42, ] %>% data.frame() #get necesssary rows and convert to df
names(first_table) <- 'col1'

#tidy the data
first_table <- first_table %>% 
  #combine Indian River to make separating on space easy
  mutate(col1 = str_replace_all(col1,"Indian River", "IndianRiver")) %>% 
  separate(col1, c("county", "election_day","early_voting", "absentee","provisional")) %>% 
  mutate(county = str_replace_all(county,"IndianRiver", "Indian River")) 

#extract second list element
second_table <- data.frame(df7[[2]])
names(second_table) <- c("county", "election_day","early_voting", "absentee","provisional")

#combine the two tables
df7 <- rbind(first_table, second_table)
df7$year <- '2012' #create year column
df7$provisional <- NULL #drop provisional column
df7 <- df7[-68, ] #remove row 68 as it contained the totals
kable(df7) %>% 
   kable_styling() %>%
   scroll_box(width = "100%", height = "400px")
county election_day early_voting absentee year
Alachua 54204 39271 27393 2012
Baker 4101 3601 1880 2012
Bay 29255 33857 17797 2012
Bradford 5869 3958 1887 2012
Brevard 144657 51449 91170 2012
Broward 341637 245952 171785 2012
Calhoun 2822 2273 1112 2012
Charlotte 24979 26688 33653 2012
Citrus 26650 24686 23031 2012
Clay 36171 41692 18842 2012
Collier 49089 52290 48646 2012
Columbia 9271 13177 4829 2012
DeSoto 4694 3138 2107 2012
Dixie 3776 1262 1959 2012
Duval 157365 174420 82913 2012
Escambia 67630 42861 39647 2012
Flagler 20070 19906 10651 2012
Franklin 2618 1470 1437 2012
Gadsden 8895 9598 4123 2012
Gilchrist 4249 1935 1789 2012
Glades 2541 498 943 2012
Gulf 2591 3046 1517 2012
Hamilton 2659 1442 1355 2012
Hardee 3346 2846 1083 2012
Hendry 4537 3846 1933 2012
Hernando 40030 12752 27806 2012
Highlands 16103 16991 9806 2012
Hillsborough 205283 166990 171666 2012
Holmes 4091 2391 1892 2012
Indian River 23841 28646 19630 2012
Jackson 8913 7777 4371 2012
Jefferson 3904 2481 1454 2012
Lafayette 1892 962 589 2012
Lake 63751 57612 30793 2012
Lee 125346 51932 89638 2012
Leon 72848 45110 30316 2012
Levy 9816 3653 5108 2012
Liberty 1617 1210 496 2012
Madison 3608 3669 1411 2012
Manatee 91799 16357 45880 2012
Marion 80570 43209 39052 2012
Martin 21739 26335 31425 2012
Miami-Dade 406841 235916 247285 2012
Monroe 14491 11106 13778 2012
Nassau 11658 18339 10772 2012
Okaloosa 40005 32847 22496 2012
Okeechobee 5895 3525 2975 2012
Orange 205018 127583 137125 2012
Osceola 41532 41736 26484 2012
Palm Beach 351600 124833 130532 2012
Pasco 94321 60974 59648 2012
Pinellas 172605 39569 250113 2012
Polk 128094 55120 66392 2012
Putnam 14169 10622 6743 2012
Santa Rosa 37931 22639 16648 2012
Sarasota 78328 55473 73876 2012
Seminole 89594 66721 53189 2012
St. Johns 39631 52565 23515 2012
St. Lucie 48960 37353 37041 2012
Sumter 16559 30269 14028 2012
Suwannee 8941 4593 4276 2012
Taylor 4490 2518 2148 2012
Union 2645 1662 1139 2012
Volusia 110545 61097 63821 2012
Wakulla 5940 5431 3321 2012
Walton 13206 10088 5319 2012
Washington 5120 3279 2636 2012

  Okay, everything looks great so far. Let’s proceed to the 2004 to 2010 data.

Pre 2012 Data

Let’s begin by combining df3, df4, df5 and df6 and then inspecting data frame.

df3_6 <- rbind(df3,df4,df5,df6)
kable(df3_6) %>% 
   kable_styling() %>%
   scroll_box(width = "100%", height = "400px")
X1 X2 X3 X4 year
COUNTY ELECTION DAY BALLOTS CAST EARLY VOTING BALLOTS CAST ABSENTEE BALLOTS CAST 2010
2010
ALACHUA 44570 16549 13567 2010
BAKER 4312 2748 1013 2010
BAY 27101 17978 10148 2010
BRADFORD 5298 1746 1145 2010
BREVARD 125316 23387 49967 2010
BROWARD 264978 85768 76023 2010
CALHOUN 2894 1008 469 2010
CHARLOTTE 23093 17514 19749 2010
CITRUS 27442 15141 13133 2010
CLAY 37780 18424 7684 2010
COLLIER 47001 29282 28290 2010
COLUMBIA 9481 6472 3149 2010
DESOTO 4008 1792 1227 2010
DIXIE 3380 910 1117 2010
DUVAL 160334 52279 48260 2010
ESCAMBIA 59410 16991 18941 2010
FLAGLER 18203 10713 4906 2010
FRANKLIN 2204 1035 877 2010
GADSDEN 10044 4597 2136 2010
GILCHRIST 3607 906 952 2010
GLADES 2075 176 555 2010
GULF 2420 1867 807 2010
HAMILTON 2471 701 897 2010
HARDEE 2873 1796 655 2010
HENDRY 4046 1657 940 2010
HERNANDO 36729 10313 12501 2010
HIGHLANDS 15830 10237 4351 2010
HILLSBOROUGH 178323 77351 65167 2010
HOLMES 3624 1322 1260 2010
INDIAN RIVER 24286 12582 11477 2010
JACKSON 8293 4157 2727 2010
JEFFERSON 3773 1673 828 2010
LAFAYETTE 1614 619 324 2010
LAKE 56958 26280 20691 2010
LEE 101711 36876 44754 2010
LEON 63435 18594 18412 2010
LEVY 7901 1671 3343 2010
LIBERTY 1468 517 242 2010
MADISON 3691 1533 1087 2010
MANATEE 73158 9765 24667 2010
MARION 71925 19737 24821 2010
MARTIN 25818 12070 17435 2010
MIAMI-DADE 268211 96884 134598 2010
MONROE 14469 4058 7971 2010
COUNTY ELECTION DAY BALLOTS CAST EARLY VOTING BALLOTS CAST ABSENTEE BALLOTS CAST 2010
2010
NASSAU 13870 8128 5120 2010
OKALOOSA 36520 13482 9058 2010
OKEECHOBEE 4685 2175 1511 2010
ORANGE 141692 51769 82355 2010
OSCEOLA 29050 16394 8985 2010
PALM BEACH 265127 62640 61502 2010
PASCO 85976 31124 21828 2010
PINELLAS 140751 10068 158668 2010
POLK 100149 22754 39937 2010
PUTNAM 14510 3841 3378 2010
SANTA ROSA 29863 11952 7369 2010
SARASOTA 82013 38276 27551 2010
SEMINOLE 72165 33205 29481 2010
ST. JOHNS 39285 24443 11370 2010
ST. LUCIE 41522 15259 20083 2010
SUMTER 14824 21694 6100 2010
SUWANNEE 8339 2684 2603 2010
TAYLOR 4103 1222 1388 2010
UNION 2378 501 796 2010
VOLUSIA 98368 29520 32621 2010
WAKULLA 6053 2997 2056 2010
WALTON 10816 5246 3090 2010
WASHINGTON 4771 1656 1743 2010
2010
TOTALS 3,102,388 1,088,706 1,241,856 2010
COUNTY ELECTION DAY BALLOTS CAST EARLY VOTING BALLOTS CAST ABSENTEE BALLOTS CAST 2008
2008
Alachua 49,077 53,556 22,003 2008
Baker 4,193 5,328 1,562 2008
Bay 33,385 29,404 16,038 2008
Bradford 5,525 4,430 1,740 2008
Brevard 168,547 51,879 63,479 2008
Broward 345,855 252,061 132,268 2008
Calhoun 3,392 2,154 712 2008
Charlotte 23,461 33,580 28,536 2008
Citrus 30,002 28,011 18,400 2008
Clay 40,397 39,141 12,605 2008
Collier 53,022 52,734 36,840 2008
Columbia 9,917 12,945 5,190 2008
DeSoto 4,520 3,761 1,888 2008
Dixie 4,052 1,742 1,544 2008
Duval 151,203 183,410 76,202 2008
Escambia 66,512 49,180 29,991 2008
Flagler 18,671 22,457 7,949 2008
Franklin 2,629 1,943 1,469 2008
Gadsden 8,829 10,559 3,064 2008
Gilchrist 4,242 2,192 1,360 2008
Glades 2,785 595 894 2008
Gulf 2,608 3,321 1,251 2008
Hamilton 2,893 1,542 1,129 2008
Hardee 3,202 3,271 973 2008
Hendry 4,402 4,872 1,652 2008
Hernando 45,231 23,093 19,875 2008
Highlands 16,203 20,806 8,088 2008
Hillsborough 246,303 146,518 114,765 2008
Holmes 4,315 2,587 1,733 2008
Indian River 25,670 27,933 16,792 2008
Jackson 11,271 6,967 3,323 2008
Jefferson 3,800 3,007 1,199 2008
Lafayette 1,801 1,041 530 2008
Lake 60,393 57,528 29,446 2008
Lee 123,872 64,081 82,030 2008
Leon 77,194 42,432 27,810 2008
Levy 10,146 4,254 4,406 2008
Liberty 1,850 1,028 432 2008
Madison 3,980 3,454 1,471 2008
Manatee 96,193 20,177 35,515 2008
COUNTY ELECTION DAY BALLOTS CAST EARLY VOTING BALLOTS CAST ABSENTEE BALLOTS CAST 2008
2008
Marion 81,455 47,683 33,419 2008
Martin 24,844 28,285 25,603 2008
Miami-Dade 372,299 326,358 170,820 2008
Monroe 14,113 13,608 12,043 2008
Nassau 10,837 18,976 8,231 2008
Okaloosa 41,232 33,340 14,841 2008
Okeechobee 5,531 4,782 2,536 2008
Orange 201,361 145,278 119,288 2008
Osceola 43,695 41,305 14,641 2008
Palm Beach 323,264 144,545 124,218 2008
Pasco 109,789 68,858 37,143 2008
Pinellas 237,430 46,385 180,873 2008
Polk 131,739 60,064 52,289 2008
Putnam 17,663 9,952 5,468 2008
Santa Rosa 24,447 26,275 19,626 2008
Sarasota 82,880 70,693 30,732 2008
Seminole 85,801 77,781 11,398 2008
St. Johns 35,905 49,241 53,087 2008
St. Lucie 47,018 43,691 42,092 2008
Sumter 11,678 28,587 8,822 2008
Suwannee 9,079 5,246 3,399 2008
Taylor 4,576 2,835 1,986 2008
Union 2,674 1,386 1,282 2008
Volusia 124,715 68,371 50,724 2008
Wakulla 5,494 6,386 2,519 2008
Walton 12,435 9,649 4,891 2008
Washington 5,687 3,138 2,377 2008
2008
TOTALS 3,839,184 2,661,672 1,850,502 2008
COUNTY ELECTION DAY BALLOTS CAST EARLY VOTING BALLOTS CAST ABSENTEE BALLOTS CAST 2006
2006
Alachua 47,367 13,418 10,456 2006
Baker 4,033 1,608 737 2006
Bay 27,548 13,488 7,607 2006
Bradford 5,214 1,121 908 2006
Brevard 185,074 12,725 28,085 2006
Broward 288,224 78,003 44,872 2006
Calhoun 2,573 522 317 2006
Charlotte 28,900 17,096 9,784 2006
Citrus 32,033 9,968 10,035 2006
Clay 37,442 8,736 5,797 2006
Collier 48,059 20,436 17,847 2006
Columbia 10,128 3,397 2,226 2006
DeSoto 4,371 1,192 1,109 2006
Dixie 3,669 799 704 2006
Duval 173,925 34,193 30,046 2006
Escambia 62,294 8,392 17,209 2006
Flagler 19,060 7,488 3,312 2006
Franklin 2,468 763 764 2006
Gadsden 12,552 2,603 1,640 2006
Gilchrist 3,867 908 673 2006
Glades 2,304 140 547 2006
Gulf 3,068 1,242 627 2006
Hamilton 2,515 382 755 2006
Hardee 2,828 1,218 516 2006
Hendry 3,639 1,120 753 2006
Hernando 40,672 8,206 8,410 2006
Highlands 17,599 8,948 4,043 2006
Hillsborough 184,379 54,051 53,072 2006
Holmes 2,419 1,236 930 2006
Indian River 27,204 10,538 6,181 2006
Jackson 9,707 2,008 1,592 2006
Jefferson 3,905 1,336 837 2006
Lafayette 1,444 403 370 2006
Lake 58,530 16,744 11,845 2006
Lee 108,042 21,914 25,491 2006
Leon 68,544 7,689 14,763 2006
Levy 8,474 778 2,215 2006
Liberty 1,840 377 204 2006
Madison 4,777 1,234 611 2006
Manatee 77,538 7,078 16,044 2006
COUNTY ELECTION DAY BALLOTS CAST EARLY VOTING BALLOTS CAST ABSENTEE BALLOTS CAST 2006
2006
Marion 75,043 10,736 16,046 2006
Martin 28,298 15,946 9,217 2006
Miami-Dade 279,910 66,190 65,955 2006
Monroe 14,983 2,700 7,012 2006
Nassau 14,101 5,527 3,061 2006
Okaloosa 37,869 9,171 9,720 2006
Okeechobee 5,408 1,712 1,261 2006
Orange 149,732 37,052 36,232 2006
Osceola 33,752 5,806 5,429 2006
Palm Beach 255,922 59,720 54,954 2006
Pasco 94,458 19,020 16,814 2006
Pinellas 207,960 35,338 50,620 2006
Polk 100,360 13,753 28,910 2006
Putnam 15,672 1,807 2,495 2006
Santa Rosa 28,637 7,976 6,108 2006
Sarasota 88,927 30,832 22,831 2006
Seminole 73,999 21,450 15,314 2006
St. Johns 41,314 11,805 8,187 2006
St. Lucie 45,553 13,132 12,124 2006
Sumter 12,193 15,266 3,137 2006
Suwannee 8,167 1,101 1,840 2006
Taylor 5,665 833 1,013 2006
Union 2,153 154 627 2006
Volusia 111,432 19,122 23,069 2006
Wakulla 9,674 2,335 1,213 2006
Walton 10,869 3,110 2,437 2006
Washington 4,959 1,239 1,202 2006
2006
TOTALS 3,385,239 796,331 750,762 2006
COUNTY ELECTION DAY BALLOTS CAST EARLY VOTING BALLOTS CAST ABSENTEE BALLOTS CAST 2004
2004
Alachua 76,381 16,389 18,956 2004
Baker 5,879 2,843 1,333 2004
Bay 37,981 18,572 18,799 2004
Bradford 6,871 2,429 1,614 2004
Brevard 185,305 30,564 50,261 2004
Broward 431,488 176,743 100,388 2004
Calhoun 3,718 1,447 850 2004
Charlotte 46,741 13,659 19,803 2004
Citrus 34,177 20,117 15,527 2004
Clay 57,209 6,770 17,786 2004
Collier 57,323 44,155 27,984 2004
Columbia 14,490 7,339 3,262 2004
Miami-Dade 435,887 244,022 100,780 2004
Desoto 5,393 2,310 1,831 2004
Dixie 4,462 1,091 926 2004
Duval 258,746 58,693 64,010 2004
Escambia 95,647 12,807 35,760 2004
Flagler 23,694 8,873 6,008 2004
Franklin 3,111 1,586 1,285 2004
Gadsden 12,024 5,467 3,629 2004
Gilchrist 4,514 1,616 922 2004
Glades 3,101 345 750 2004
Gulf 4,099 1,725 1,489 2004
Hamilton 3,147 937 1,060 2004
Hardee 3,400 2,831 1,073 2004
Hendry 5,817 2,481 1,537 2004
Hernando 50,942 13,942 15,920 2004
Highlands 23,351 11,385 7,168 2004
Hillsborough 313,930 86,642 64,857 2004
Holmes 5,418 1,987 970 2004
Indian River 32,015 17,328 12,210 2004
Jackson 12,655 4,677 2,613 2004
Jefferson 3,979 2,486 1,040 2004
Lafayette 1,935 873 548 2004
Lake 68,121 32,331 24,063 2004
Lee 161,953 32,182 47,669 2004
Leon 95,397 17,974 23,000 2004
Levy 11,748 2,078 2,939 2004
Liberty 1,823 864 366 2004
Madison 4,905 2,446 1,007 2004
Manatee 106,917 10,274 27,052 2004
Monroe 19,706 8,918 11,091 2004
Marion 99,628 14,799 25,938 2004
Martin 43,892 12,243 16,746 2004
COUNTY ELECTION DAY BALLOTS CAST EARLY VOTING BALLOTS CAST ABSENTEE BALLOTS CAST 2004
2004
Nassau 15,839 9,989 7,028 2004
Okaloosa 59,566 10,383 20,073 2004
Okeechobee 6,565 3,507 2,194 2004
Orange 243,536 79,171 68,899 2004
Osceola 52,525 19,120 10,550 2004
Palm Beach 404,666 49,831 92,428 2004
Pasco 127,526 29,584 35,059 2004
Pinellas 299,201 69,902 89,004 2004
Polk 148,150 6,023 59,651 2004
Putnam 22,775 3,775 4,608 2004
Santa Rosa 40,098 15,824 11,541 2004
Sarasota 122,101 33,872 40,838 2004
Seminole 135,161 20,671 30,853 2004
St. Johns 52,151 18,758 15,378 2004
St. Lucie 58,795 22,882 18,896 2004
Sumter 11,382 16,517 4,180 2004
Suwannee 10,666 2,466 2,736 2004
Taylor 4,671 2,421 1,533 2004
Union 2,836 369 1,509 2004
Volusia 144,109 40,166 44,214 2004
Wakulla 5,372 4,776 1,671 2004
Walton 12,996 6,462 4,611 2004
Washington 5,676 2,653 2,173 2004
2004
STATE TOTAL 4,865,283 1,428,362 1,352,447 2004

 

It appears that this data frame is in tidy form but there are still a few issues that need to be fixed. For instance the column names are in the first row and the second row appears to be empty across all columns except for year. Also, what we’re seeing here will be repeated throughout df3_6 each time a new data frame is added as each of df3 to df6 are identically formatted.Thus, we cannot simply fix the first two rows of df3_6 and consider it tidy. Fortunately, formatting the data for 2004 to 2010 will not be as challenging as formatting the data for 2012 onwards. We’ll follow a simplified process from the previous sections to get this data frame in tidy form.

#rename columns
colnames(df3_6) <- c("county","election_day","early_voting", "absentee","year")
df3_6$county <- tolower(df3_6$county)

to_exclude <- which(str_detect(df3_6$county,"county|totals|state total"))
df3_6 <- df3_6[-to_exclude, ]

df3_6 <-  df3_6[order(df3_6$county),] #order putting empty rows at top
df3_6 <- df3_6[-c(1:12),]
df3_6 <- df3_6[order(df3_6$year),] #order by year
kable(df3_6, row.names = FALSE) %>% 
   kable_styling() %>%
   scroll_box(width = "100%", height = "400px")
county election_day early_voting absentee year
alachua 44570 16549 13567 2010
baker 4312 2748 1013 2010
bay 27101 17978 10148 2010
bradford 5298 1746 1145 2010
brevard 125316 23387 49967 2010
broward 264978 85768 76023 2010
calhoun 2894 1008 469 2010
charlotte 23093 17514 19749 2010
citrus 27442 15141 13133 2010
clay 37780 18424 7684 2010
collier 47001 29282 28290 2010
columbia 9481 6472 3149 2010
desoto 4008 1792 1227 2010
dixie 3380 910 1117 2010
duval 160334 52279 48260 2010
escambia 59410 16991 18941 2010
flagler 18203 10713 4906 2010
franklin 2204 1035 877 2010
gadsden 10044 4597 2136 2010
gilchrist 3607 906 952 2010
glades 2075 176 555 2010
gulf 2420 1867 807 2010
hamilton 2471 701 897 2010
hardee 2873 1796 655 2010
hendry 4046 1657 940 2010
hernando 36729 10313 12501 2010
highlands 15830 10237 4351 2010
hillsborough 178323 77351 65167 2010
holmes 3624 1322 1260 2010
indian river 24286 12582 11477 2010
jackson 8293 4157 2727 2010
jefferson 3773 1673 828 2010
lafayette 1614 619 324 2010
lake 56958 26280 20691 2010
lee 101711 36876 44754 2010
leon 63435 18594 18412 2010
levy 7901 1671 3343 2010
liberty 1468 517 242 2010
madison 3691 1533 1087 2010
manatee 73158 9765 24667 2010
marion 71925 19737 24821 2010
martin 25818 12070 17435 2010
miami-dade 268211 96884 134598 2010
monroe 14469 4058 7971 2010
nassau 13870 8128 5120 2010
okaloosa 36520 13482 9058 2010
okeechobee 4685 2175 1511 2010
orange 141692 51769 82355 2010
osceola 29050 16394 8985 2010
palm beach 265127 62640 61502 2010
pasco 85976 31124 21828 2010
pinellas 140751 10068 158668 2010
polk 100149 22754 39937 2010
putnam 14510 3841 3378 2010
santa rosa 29863 11952 7369 2010
sarasota 82013 38276 27551 2010
seminole 72165 33205 29481 2010
st. johns 39285 24443 11370 2010
st. lucie 41522 15259 20083 2010
sumter 14824 21694 6100 2010
suwannee 8339 2684 2603 2010
taylor 4103 1222 1388 2010
union 2378 501 796 2010
volusia 98368 29520 32621 2010
wakulla 6053 2997 2056 2010
walton 10816 5246 3090 2010
washington 4771 1656 1743 2010
alachua 49,077 53,556 22,003 2008
baker 4,193 5,328 1,562 2008
bay 33,385 29,404 16,038 2008
bradford 5,525 4,430 1,740 2008
brevard 168,547 51,879 63,479 2008
broward 345,855 252,061 132,268 2008
calhoun 3,392 2,154 712 2008
charlotte 23,461 33,580 28,536 2008
citrus 30,002 28,011 18,400 2008
clay 40,397 39,141 12,605 2008
collier 53,022 52,734 36,840 2008
columbia 9,917 12,945 5,190 2008
desoto 4,520 3,761 1,888 2008
dixie 4,052 1,742 1,544 2008
duval 151,203 183,410 76,202 2008
escambia 66,512 49,180 29,991 2008
flagler 18,671 22,457 7,949 2008
franklin 2,629 1,943 1,469 2008
gadsden 8,829 10,559 3,064 2008
gilchrist 4,242 2,192 1,360 2008
glades 2,785 595 894 2008
gulf 2,608 3,321 1,251 2008
hamilton 2,893 1,542 1,129 2008
hardee 3,202 3,271 973 2008
hendry 4,402 4,872 1,652 2008
hernando 45,231 23,093 19,875 2008
highlands 16,203 20,806 8,088 2008
hillsborough 246,303 146,518 114,765 2008
holmes 4,315 2,587 1,733 2008
indian river 25,670 27,933 16,792 2008
jackson 11,271 6,967 3,323 2008
jefferson 3,800 3,007 1,199 2008
lafayette 1,801 1,041 530 2008
lake 60,393 57,528 29,446 2008
lee 123,872 64,081 82,030 2008
leon 77,194 42,432 27,810 2008
levy 10,146 4,254 4,406 2008
liberty 1,850 1,028 432 2008
madison 3,980 3,454 1,471 2008
manatee 96,193 20,177 35,515 2008
marion 81,455 47,683 33,419 2008
martin 24,844 28,285 25,603 2008
miami-dade 372,299 326,358 170,820 2008
monroe 14,113 13,608 12,043 2008
nassau 10,837 18,976 8,231 2008
okaloosa 41,232 33,340 14,841 2008
okeechobee 5,531 4,782 2,536 2008
orange 201,361 145,278 119,288 2008
osceola 43,695 41,305 14,641 2008
palm beach 323,264 144,545 124,218 2008
pasco 109,789 68,858 37,143 2008
pinellas 237,430 46,385 180,873 2008
polk 131,739 60,064 52,289 2008
putnam 17,663 9,952 5,468 2008
santa rosa 24,447 26,275 19,626 2008
sarasota 82,880 70,693 30,732 2008
seminole 85,801 77,781 11,398 2008
st. johns 35,905 49,241 53,087 2008
st. lucie 47,018 43,691 42,092 2008
sumter 11,678 28,587 8,822 2008
suwannee 9,079 5,246 3,399 2008
taylor 4,576 2,835 1,986 2008
union 2,674 1,386 1,282 2008
volusia 124,715 68,371 50,724 2008
wakulla 5,494 6,386 2,519 2008
walton 12,435 9,649 4,891 2008
washington 5,687 3,138 2,377 2008
alachua 47,367 13,418 10,456 2006
baker 4,033 1,608 737 2006
bay 27,548 13,488 7,607 2006
bradford 5,214 1,121 908 2006
brevard 185,074 12,725 28,085 2006
broward 288,224 78,003 44,872 2006
calhoun 2,573 522 317 2006
charlotte 28,900 17,096 9,784 2006
citrus 32,033 9,968 10,035 2006
clay 37,442 8,736 5,797 2006
collier 48,059 20,436 17,847 2006
columbia 10,128 3,397 2,226 2006
desoto 4,371 1,192 1,109 2006
dixie 3,669 799 704 2006
duval 173,925 34,193 30,046 2006
escambia 62,294 8,392 17,209 2006
flagler 19,060 7,488 3,312 2006
franklin 2,468 763 764 2006
gadsden 12,552 2,603 1,640 2006
gilchrist 3,867 908 673 2006
glades 2,304 140 547 2006
gulf 3,068 1,242 627 2006
hamilton 2,515 382 755 2006
hardee 2,828 1,218 516 2006
hendry 3,639 1,120 753 2006
hernando 40,672 8,206 8,410 2006
highlands 17,599 8,948 4,043 2006
hillsborough 184,379 54,051 53,072 2006
holmes 2,419 1,236 930 2006
indian river 27,204 10,538 6,181 2006
jackson 9,707 2,008 1,592 2006
jefferson 3,905 1,336 837 2006
lafayette 1,444 403 370 2006
lake 58,530 16,744 11,845 2006
lee 108,042 21,914 25,491 2006
leon 68,544 7,689 14,763 2006
levy 8,474 778 2,215 2006
liberty 1,840 377 204 2006
madison 4,777 1,234 611 2006
manatee 77,538 7,078 16,044 2006
marion 75,043 10,736 16,046 2006
martin 28,298 15,946 9,217 2006
miami-dade 279,910 66,190 65,955 2006
monroe 14,983 2,700 7,012 2006
nassau 14,101 5,527 3,061 2006
okaloosa 37,869 9,171 9,720 2006
okeechobee 5,408 1,712 1,261 2006
orange 149,732 37,052 36,232 2006
osceola 33,752 5,806 5,429 2006
palm beach 255,922 59,720 54,954 2006
pasco 94,458 19,020 16,814 2006
pinellas 207,960 35,338 50,620 2006
polk 100,360 13,753 28,910 2006
putnam 15,672 1,807 2,495 2006
santa rosa 28,637 7,976 6,108 2006
sarasota 88,927 30,832 22,831 2006
seminole 73,999 21,450 15,314 2006
st. johns 41,314 11,805 8,187 2006
st. lucie 45,553 13,132 12,124 2006
sumter 12,193 15,266 3,137 2006
suwannee 8,167 1,101 1,840 2006
taylor 5,665 833 1,013 2006
union 2,153 154 627 2006
volusia 111,432 19,122 23,069 2006
wakulla 9,674 2,335 1,213 2006
walton 10,869 3,110 2,437 2006
washington 4,959 1,239 1,202 2006
alachua 76,381 16,389 18,956 2004
baker 5,879 2,843 1,333 2004
bay 37,981 18,572 18,799 2004
bradford 6,871 2,429 1,614 2004
brevard 185,305 30,564 50,261 2004
broward 431,488 176,743 100,388 2004
calhoun 3,718 1,447 850 2004
charlotte 46,741 13,659 19,803 2004
citrus 34,177 20,117 15,527 2004
clay 57,209 6,770 17,786 2004
collier 57,323 44,155 27,984 2004
columbia 14,490 7,339 3,262 2004
desoto 5,393 2,310 1,831 2004
dixie 4,462 1,091 926 2004
duval 258,746 58,693 64,010 2004
escambia 95,647 12,807 35,760 2004
flagler 23,694 8,873 6,008 2004
franklin 3,111 1,586 1,285 2004
gadsden 12,024 5,467 3,629 2004
gilchrist 4,514 1,616 922 2004
glades 3,101 345 750 2004
gulf 4,099 1,725 1,489 2004
hamilton 3,147 937 1,060 2004
hardee 3,400 2,831 1,073 2004
hendry 5,817 2,481 1,537 2004
hernando 50,942 13,942 15,920 2004
highlands 23,351 11,385 7,168 2004
hillsborough 313,930 86,642 64,857 2004
holmes 5,418 1,987 970 2004
indian river 32,015 17,328 12,210 2004
jackson 12,655 4,677 2,613 2004
jefferson 3,979 2,486 1,040 2004
lafayette 1,935 873 548 2004
lake 68,121 32,331 24,063 2004
lee 161,953 32,182 47,669 2004
leon 95,397 17,974 23,000 2004
levy 11,748 2,078 2,939 2004
liberty 1,823 864 366 2004
madison 4,905 2,446 1,007 2004
manatee 106,917 10,274 27,052 2004
marion 99,628 14,799 25,938 2004
martin 43,892 12,243 16,746 2004
miami-dade 435,887 244,022 100,780 2004
monroe 19,706 8,918 11,091 2004
nassau 15,839 9,989 7,028 2004
okaloosa 59,566 10,383 20,073 2004
okeechobee 6,565 3,507 2,194 2004
orange 243,536 79,171 68,899 2004
osceola 52,525 19,120 10,550 2004
palm beach 404,666 49,831 92,428 2004
pasco 127,526 29,584 35,059 2004
pinellas 299,201 69,902 89,004 2004
polk 148,150 6,023 59,651 2004
putnam 22,775 3,775 4,608 2004
santa rosa 40,098 15,824 11,541 2004
sarasota 122,101 33,872 40,838 2004
seminole 135,161 20,671 30,853 2004
st. johns 52,151 18,758 15,378 2004
st. lucie 58,795 22,882 18,896 2004
sumter 11,382 16,517 4,180 2004
suwannee 10,666 2,466 2,736 2004
taylor 4,671 2,421 1,533 2004
union 2,836 369 1,509 2004
volusia 144,109 40,166 44,214 2004
wakulla 5,372 4,776 1,671 2004
walton 12,996 6,462 4,611 2004
washington 5,676 2,653 2,173 2004

Final Data Frame

Now that we have formatted the data for all the years let’s combine all the data frames and check to ensure that the county names are consistent across years

df <- rbind(df_12,df7,df3_6)
df$county <- tolower(df$county)
unique(df$county)
##  [1] "alachua"      "baker"        "bay"          "bradford"    
##  [5] "brevard"      "broward"      "calhoun"      "charlotte"   
##  [9] "citrus"       "clay"         "collier"      "columbia"    
## [13] "desoto"       "dixie"        "duval"        "escambia"    
## [17] "flagler"      "franklin"     "gadsden"      "gilchrist"   
## [21] "glades"       "gulf"         "hamilton"     "hardee"      
## [25] "hendry"       "hernando"     "highlands"    "hillsborough"
## [29] "holmes"       "indian river" "jackson"      "jefferson"   
## [33] "lafayette"    "lake"         "lee"          "leon"        
## [37] "levy"         "liberty"      "madison"      "manatee"     
## [41] "marion"       "martin"       "miamiâ dade"  "monroe"      
## [45] "nassau"       "okaloosa"     "okeechobee"   "orange"      
## [49] "osceola"      "palm beach"   "pasco"        "pinellas"    
## [53] "polk"         "putnam"       "santa rosa"   "sarasota"    
## [57] "seminole"     "st johns"     "st lucie"     "sumter"      
## [61] "suwannee"     "taylor"       "union"        "volusia"     
## [65] "wakulla"      "walton"       "washington"   "miami-dade"  
## [69] "st. johns"    "st. lucie"

We see that there are two different spellings for St Johns and St Lucie. Let’s correct this and also verify that each county occurs a total 7 times. Once for each year.

df$county <- gsub("st johns","st. johns",df$county)
df$county <- gsub("st lucie", "st. lucie", df$county)

unique(table(df$county))
## [1] 7 5 2

By wrapping table inside of unique we were able to determine that the count for each county has a value of 7 for each of the seven years. Also, let’s confirm that we do have the correct years.

unique(df$year)
## [1] 2016 2014 2012 2010 2008 2006 2004
## Levels: 2016 2014 2012 2010 2008 2006 2004

Great, now let’s glimpse the data again to confirm that everything is in order.

glimpse(df)
## Observations: 469
## Variables: 5
## $ year         <fct> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2...
## $ county       <chr> "alachua", "baker", "bay", "bradford", "brevard",...
## $ election_day <chr> "50491", "3584", "25579", "5095", "109623", "2131...
## $ early_voting <chr> "51097", "7180", "43963", "4675", "118427", "4265...
## $ absentee     <chr> "29040", "2003", "18952", "2450", "92708", "20290...

Oops! It appears that the numeric columns are treated as characters. Let’s check the last few rows of the data to see if we can detect the problem.

tail(df)
##     year     county election_day early_voting absentee
## 285 2004     taylor        4,671        2,421    1,533
## 286 2004      union        2,836          369    1,509
## 287 2004    volusia      144,109       40,166   44,214
## 288 2004    wakulla        5,372        4,776    1,671
## 289 2004     walton       12,996        6,462    4,611
## 290 2004 washington        5,676        2,653    2,173

It appears that the comma is embedded in some of the values, for the earlier years, causing numeric values to be interpreted as characters. Let’s remove the comma and check the data once more.

df[,c(3:5)] <- apply(df[,c(3:5)], 2, function(x) as.numeric(gsub(",","",as.character(x))))

kable(df,row.names = FALSE) %>% 
   kable_styling() %>%
   scroll_box(width = "100%", height = "400px")
year county election_day early_voting absentee
2016 alachua 50491 51097 29040
2016 baker 3584 7180 2003
2016 bay 25579 43963 18952
2016 bradford 5095 4675 2450
2016 brevard 109623 118427 92708
2016 broward 213112 426592 202904
2016 calhoun 2448 2692 1053
2016 charlotte 23477 39644 35019
2016 citrus 26556 29323 25105
2016 clay 33064 54533 20165
2016 collier 42681 74932 57159
2016 columbia 9132 14044 5884
2016 desoto 4274 4452 2216
2016 dixie 3011 2319 1935
2016 duval 136287 227475 76053
2016 escambia 59696 56213 39791
2016 flagler 15191 29696 13537
2016 franklin 2264 2314 1533
2016 gadsden 8359 9627 4317
2016 gilchrist 4233 2412 1865
2016 glades 2720 863 831
2016 gulf 2293 3612 1456
2016 hamilton 2430 1702 1392
2016 hardee 3241 3618 829
2016 hendry 3597 5810 1861
2016 hernando 35004 25911 34558
2016 highlands 16149 18996 11197
2016 hillsborough 173637 242184 191201
2016 holmes 3669 2901 2044
2016 indian river 25359 33136 23196
2016 jackson 7970 9116 4179
2016 jefferson 3553 2883 1317
2016 lafayette 1653 1147 624
2016 lake 65105 75197 32672
2016 lee 74735 100893 158302
2016 leon 55731 67518 31792
2016 levy 9287 5108 5182
2016 liberty 1497 1347 508
2016 madison 2792 4554 1261
2016 manatee 64556 48604 68494
2016 marion 72776 66191 38146
2016 martin 23118 34038 30375
2016 miamiâ dade 217803 476001 311597
2016 monroe 15597 13325 14398
2016 nassau 13480 23515 10294
2016 okaloosa 34348 45186 23318
2016 okeechobee 5393 5879 2596
2016 orange 141015 255113 169031
2016 osceola 34044 63254 45228
2016 palm beach 278030 241376 152226
2016 pasco 100110 77027 68565
2016 pinellas 168563 78177 255374
2016 polk 116350 92598 80091
2016 putnam 12996 13856 6621
2016 santa rosa 32823 37197 19143
2016 sarasota 75810 79817 77811
2016 seminole 56794 109262 63291
2016 st. johns 39611 71617 27329
2016 st. lucie 44579 57289 41389
2016 sumter 12517 45545 19812
2016 suwannee 8006 6490 4374
2016 taylor 4011 3099 2257
2016 union 2412 2317 1025
2016 volusia 91647 97936 74939
2016 wakulla 5548 6824 3152
2016 walton 13519 13439 7141
2016 washington 5050 3675 2509
2014 alachua 41973 17983 19091
2014 baker 4350 2621 1653
2014 bay 26898 17278 14225
2014 bradford 4997 2062 1640
2014 brevard 118826 33327 73765
2014 broward 223095 129248 123820
2014 calhoun 2297 1258 701
2014 charlotte 23534 16463 28194
2014 citrus 25484 13800 19777
2014 clay 36089 19082 13699
2014 collier 42921 31768 40603
2014 columbia 8658 6161 3828
2014 desoto 3844 2158 1622
2014 dixie 2827 1196 1489
2014 duval 143441 73604 55759
2014 escambia 53510 18421 27453
2014 flagler 15407 13984 9091
2014 franklin 2253 998 1128
2014 gadsden 9099 5162 3533
2014 gilchrist 3624 1035 1405
2014 glades 2057 245 775
2014 gulf 2288 1830 1090
2014 hamilton 2057 662 1049
2014 hardee 1855 1842 789
2014 hendry 3100 2281 1400
2014 hernando 31957 9781 23154
2014 highlands 15466 9126 7900
2014 hillsborough 156529 88379 131659
2014 holmes 3039 1256 1512
2014 indian river 24047 14749 14783
2014 jackson 7349 4354 3644
2014 jefferson 3550 1587 1192
2014 lafayette 1500 693 480
2014 lake 60379 28798 24980
2014 lee 70887 38077 104345
2014 leon 58289 27885 22939
2014 levy 7222 1993 4315
2014 liberty 1514 815 390
2014 madison 3250 2122 1040
2014 manatee 62249 13898 45344
2014 marion 67987 20715 33109
2014 martin 23313 14615 24621
2014 miamiâ dade 227275 116936 189324
2014 monroe 13633 5665 9818
2014 nassau 14388 8679 7888
2014 okaloosa 33757 14877 13409
2014 okeechobee 4665 2228 2147
2014 orange 130688 71962 110637
2014 osceola 31266 15866 22544
2014 palm beach 241002 93593 86952
2014 pasco 83556 32455 46912
2014 pinellas 133828 18987 204706
2014 polk 104849 29473 60309
2014 putnam 12596 5422 5041
2014 santa rosa 30964 11982 10785
2014 sarasota 77364 30903 55160
2014 seminole 66202 36368 45222
2014 st. johns 44479 26845 16572
2014 st. lucie 41534 19632 29399
2014 sumter 16260 25471 13941
2014 suwannee 6947 2424 3498
2014 taylor 3738 1206 1826
2014 union 2396 1713 1055
2014 volusia 88743 33259 56719
2014 wakulla 5911 3176 2599
2014 walton 11495 4763 4311
2014 washington 4454 1846 2065
2012 alachua 54204 39271 27393
2012 baker 4101 3601 1880
2012 bay 29255 33857 17797
2012 bradford 5869 3958 1887
2012 brevard 144657 51449 91170
2012 broward 341637 245952 171785
2012 calhoun 2822 2273 1112
2012 charlotte 24979 26688 33653
2012 citrus 26650 24686 23031
2012 clay 36171 41692 18842
2012 collier 49089 52290 48646
2012 columbia 9271 13177 4829
2012 desoto 4694 3138 2107
2012 dixie 3776 1262 1959
2012 duval 157365 174420 82913
2012 escambia 67630 42861 39647
2012 flagler 20070 19906 10651
2012 franklin 2618 1470 1437
2012 gadsden 8895 9598 4123
2012 gilchrist 4249 1935 1789
2012 glades 2541 498 943
2012 gulf 2591 3046 1517
2012 hamilton 2659 1442 1355
2012 hardee 3346 2846 1083
2012 hendry 4537 3846 1933
2012 hernando 40030 12752 27806
2012 highlands 16103 16991 9806
2012 hillsborough 205283 166990 171666
2012 holmes 4091 2391 1892
2012 indian river 23841 28646 19630
2012 jackson 8913 7777 4371
2012 jefferson 3904 2481 1454
2012 lafayette 1892 962 589
2012 lake 63751 57612 30793
2012 lee 125346 51932 89638
2012 leon 72848 45110 30316
2012 levy 9816 3653 5108
2012 liberty 1617 1210 496
2012 madison 3608 3669 1411
2012 manatee 91799 16357 45880
2012 marion 80570 43209 39052
2012 martin 21739 26335 31425
2012 miami-dade 406841 235916 247285
2012 monroe 14491 11106 13778
2012 nassau 11658 18339 10772
2012 okaloosa 40005 32847 22496
2012 okeechobee 5895 3525 2975
2012 orange 205018 127583 137125
2012 osceola 41532 41736 26484
2012 palm beach 351600 124833 130532
2012 pasco 94321 60974 59648
2012 pinellas 172605 39569 250113
2012 polk 128094 55120 66392
2012 putnam 14169 10622 6743
2012 santa rosa 37931 22639 16648
2012 sarasota 78328 55473 73876
2012 seminole 89594 66721 53189
2012 st. johns 39631 52565 23515
2012 st. lucie 48960 37353 37041
2012 sumter 16559 30269 14028
2012 suwannee 8941 4593 4276
2012 taylor 4490 2518 2148
2012 union 2645 1662 1139
2012 volusia 110545 61097 63821
2012 wakulla 5940 5431 3321
2012 walton 13206 10088 5319
2012 washington 5120 3279 2636
2010 alachua 44570 16549 13567
2010 baker 4312 2748 1013
2010 bay 27101 17978 10148
2010 bradford 5298 1746 1145
2010 brevard 125316 23387 49967
2010 broward 264978 85768 76023
2010 calhoun 2894 1008 469
2010 charlotte 23093 17514 19749
2010 citrus 27442 15141 13133
2010 clay 37780 18424 7684
2010 collier 47001 29282 28290
2010 columbia 9481 6472 3149
2010 desoto 4008 1792 1227
2010 dixie 3380 910 1117
2010 duval 160334 52279 48260
2010 escambia 59410 16991 18941
2010 flagler 18203 10713 4906
2010 franklin 2204 1035 877
2010 gadsden 10044 4597 2136
2010 gilchrist 3607 906 952
2010 glades 2075 176 555
2010 gulf 2420 1867 807
2010 hamilton 2471 701 897
2010 hardee 2873 1796 655
2010 hendry 4046 1657 940
2010 hernando 36729 10313 12501
2010 highlands 15830 10237 4351
2010 hillsborough 178323 77351 65167
2010 holmes 3624 1322 1260
2010 indian river 24286 12582 11477
2010 jackson 8293 4157 2727
2010 jefferson 3773 1673 828
2010 lafayette 1614 619 324
2010 lake 56958 26280 20691
2010 lee 101711 36876 44754
2010 leon 63435 18594 18412
2010 levy 7901 1671 3343
2010 liberty 1468 517 242
2010 madison 3691 1533 1087
2010 manatee 73158 9765 24667
2010 marion 71925 19737 24821
2010 martin 25818 12070 17435
2010 miami-dade 268211 96884 134598
2010 monroe 14469 4058 7971
2010 nassau 13870 8128 5120
2010 okaloosa 36520 13482 9058
2010 okeechobee 4685 2175 1511
2010 orange 141692 51769 82355
2010 osceola 29050 16394 8985
2010 palm beach 265127 62640 61502
2010 pasco 85976 31124 21828
2010 pinellas 140751 10068 158668
2010 polk 100149 22754 39937
2010 putnam 14510 3841 3378
2010 santa rosa 29863 11952 7369
2010 sarasota 82013 38276 27551
2010 seminole 72165 33205 29481
2010 st. johns 39285 24443 11370
2010 st. lucie 41522 15259 20083
2010 sumter 14824 21694 6100
2010 suwannee 8339 2684 2603
2010 taylor 4103 1222 1388
2010 union 2378 501 796
2010 volusia 98368 29520 32621
2010 wakulla 6053 2997 2056
2010 walton 10816 5246 3090
2010 washington 4771 1656 1743
2008 alachua 49077 53556 22003
2008 baker 4193 5328 1562
2008 bay 33385 29404 16038
2008 bradford 5525 4430 1740
2008 brevard 168547 51879 63479
2008 broward 345855 252061 132268
2008 calhoun 3392 2154 712
2008 charlotte 23461 33580 28536
2008 citrus 30002 28011 18400
2008 clay 40397 39141 12605
2008 collier 53022 52734 36840
2008 columbia 9917 12945 5190
2008 desoto 4520 3761 1888
2008 dixie 4052 1742 1544
2008 duval 151203 183410 76202
2008 escambia 66512 49180 29991
2008 flagler 18671 22457 7949
2008 franklin 2629 1943 1469
2008 gadsden 8829 10559 3064
2008 gilchrist 4242 2192 1360
2008 glades 2785 595 894
2008 gulf 2608 3321 1251
2008 hamilton 2893 1542 1129
2008 hardee 3202 3271 973
2008 hendry 4402 4872 1652
2008 hernando 45231 23093 19875
2008 highlands 16203 20806 8088
2008 hillsborough 246303 146518 114765
2008 holmes 4315 2587 1733
2008 indian river 25670 27933 16792
2008 jackson 11271 6967 3323
2008 jefferson 3800 3007 1199
2008 lafayette 1801 1041 530
2008 lake 60393 57528 29446
2008 lee 123872 64081 82030
2008 leon 77194 42432 27810
2008 levy 10146 4254 4406
2008 liberty 1850 1028 432
2008 madison 3980 3454 1471
2008 manatee 96193 20177 35515
2008 marion 81455 47683 33419
2008 martin 24844 28285 25603
2008 miami-dade 372299 326358 170820
2008 monroe 14113 13608 12043
2008 nassau 10837 18976 8231
2008 okaloosa 41232 33340 14841
2008 okeechobee 5531 4782 2536
2008 orange 201361 145278 119288
2008 osceola 43695 41305 14641
2008 palm beach 323264 144545 124218
2008 pasco 109789 68858 37143
2008 pinellas 237430 46385 180873
2008 polk 131739 60064 52289
2008 putnam 17663 9952 5468
2008 santa rosa 24447 26275 19626
2008 sarasota 82880 70693 30732
2008 seminole 85801 77781 11398
2008 st. johns 35905 49241 53087
2008 st. lucie 47018 43691 42092
2008 sumter 11678 28587 8822
2008 suwannee 9079 5246 3399
2008 taylor 4576 2835 1986
2008 union 2674 1386 1282
2008 volusia 124715 68371 50724
2008 wakulla 5494 6386 2519
2008 walton 12435 9649 4891
2008 washington 5687 3138 2377
2006 alachua 47367 13418 10456
2006 baker 4033 1608 737
2006 bay 27548 13488 7607
2006 bradford 5214 1121 908
2006 brevard 185074 12725 28085
2006 broward 288224 78003 44872
2006 calhoun 2573 522 317
2006 charlotte 28900 17096 9784
2006 citrus 32033 9968 10035
2006 clay 37442 8736 5797
2006 collier 48059 20436 17847
2006 columbia 10128 3397 2226
2006 desoto 4371 1192 1109
2006 dixie 3669 799 704
2006 duval 173925 34193 30046
2006 escambia 62294 8392 17209
2006 flagler 19060 7488 3312
2006 franklin 2468 763 764
2006 gadsden 12552 2603 1640
2006 gilchrist 3867 908 673
2006 glades 2304 140 547
2006 gulf 3068 1242 627
2006 hamilton 2515 382 755
2006 hardee 2828 1218 516
2006 hendry 3639 1120 753
2006 hernando 40672 8206 8410
2006 highlands 17599 8948 4043
2006 hillsborough 184379 54051 53072
2006 holmes 2419 1236 930
2006 indian river 27204 10538 6181
2006 jackson 9707 2008 1592
2006 jefferson 3905 1336 837
2006 lafayette 1444 403 370
2006 lake 58530 16744 11845
2006 lee 108042 21914 25491
2006 leon 68544 7689 14763
2006 levy 8474 778 2215
2006 liberty 1840 377 204
2006 madison 4777 1234 611
2006 manatee 77538 7078 16044
2006 marion 75043 10736 16046
2006 martin 28298 15946 9217
2006 miami-dade 279910 66190 65955
2006 monroe 14983 2700 7012
2006 nassau 14101 5527 3061
2006 okaloosa 37869 9171 9720
2006 okeechobee 5408 1712 1261
2006 orange 149732 37052 36232
2006 osceola 33752 5806 5429
2006 palm beach 255922 59720 54954
2006 pasco 94458 19020 16814
2006 pinellas 207960 35338 50620
2006 polk 100360 13753 28910
2006 putnam 15672 1807 2495
2006 santa rosa 28637 7976 6108
2006 sarasota 88927 30832 22831
2006 seminole 73999 21450 15314
2006 st. johns 41314 11805 8187
2006 st. lucie 45553 13132 12124
2006 sumter 12193 15266 3137
2006 suwannee 8167 1101 1840
2006 taylor 5665 833 1013
2006 union 2153 154 627
2006 volusia 111432 19122 23069
2006 wakulla 9674 2335 1213
2006 walton 10869 3110 2437
2006 washington 4959 1239 1202
2004 alachua 76381 16389 18956
2004 baker 5879 2843 1333
2004 bay 37981 18572 18799
2004 bradford 6871 2429 1614
2004 brevard 185305 30564 50261
2004 broward 431488 176743 100388
2004 calhoun 3718 1447 850
2004 charlotte 46741 13659 19803
2004 citrus 34177 20117 15527
2004 clay 57209 6770 17786
2004 collier 57323 44155 27984
2004 columbia 14490 7339 3262
2004 desoto 5393 2310 1831
2004 dixie 4462 1091 926
2004 duval 258746 58693 64010
2004 escambia 95647 12807 35760
2004 flagler 23694 8873 6008
2004 franklin 3111 1586 1285
2004 gadsden 12024 5467 3629
2004 gilchrist 4514 1616 922
2004 glades 3101 345 750
2004 gulf 4099 1725 1489
2004 hamilton 3147 937 1060
2004 hardee 3400 2831 1073
2004 hendry 5817 2481 1537
2004 hernando 50942 13942 15920
2004 highlands 23351 11385 7168
2004 hillsborough 313930 86642 64857
2004 holmes 5418 1987 970
2004 indian river 32015 17328 12210
2004 jackson 12655 4677 2613
2004 jefferson 3979 2486 1040
2004 lafayette 1935 873 548
2004 lake 68121 32331 24063
2004 lee 161953 32182 47669
2004 leon 95397 17974 23000
2004 levy 11748 2078 2939
2004 liberty 1823 864 366
2004 madison 4905 2446 1007
2004 manatee 106917 10274 27052
2004 marion 99628 14799 25938
2004 martin 43892 12243 16746
2004 miami-dade 435887 244022 100780
2004 monroe 19706 8918 11091
2004 nassau 15839 9989 7028
2004 okaloosa 59566 10383 20073
2004 okeechobee 6565 3507 2194
2004 orange 243536 79171 68899
2004 osceola 52525 19120 10550
2004 palm beach 404666 49831 92428
2004 pasco 127526 29584 35059
2004 pinellas 299201 69902 89004
2004 polk 148150 6023 59651
2004 putnam 22775 3775 4608
2004 santa rosa 40098 15824 11541
2004 sarasota 122101 33872 40838
2004 seminole 135161 20671 30853
2004 st. johns 52151 18758 15378
2004 st. lucie 58795 22882 18896
2004 sumter 11382 16517 4180
2004 suwannee 10666 2466 2736
2004 taylor 4671 2421 1533
2004 union 2836 369 1509
2004 volusia 144109 40166 44214
2004 wakulla 5372 4776 1671
2004 walton 12996 6462 4611
2004 washington 5676 2653 2173

 

Finally, the data is in tidy format! Let’s pause for a round of applause.  

Data Visualization

Let’s visualize the data to reveal any trends.

data <- gather(df, "vote_type", count, 3:5, -c(1,2))


to_plot <- data %>% 
  group_by(year,vote_type) %>% 
  summarise(total = sum(count)) %>% 
  mutate(percentage = total/sum(total))

to_plot$year <- factor(to_plot$year, levels = c("2004","2006","2008","2010","2012","2014","2016"))
ggplot(to_plot, aes(x = year, y = total, fill = vote_type))+
  geom_col()+
  labs(x = 'Year',y = 'Total Voters', title = 'Florida\'s\ Yearly Voter Turnout\n (2004 - 2016)',
       fill = 'Vote Method')+
  scale_y_continuous(labels = comma)+
  geom_text(aes(label = paste0(round(percentage*100),"%")), position = position_stack(vjust = 0.5))+
  scale_fill_brewer(palette = "Dark2")+
  theme_minimal()+
  theme(plot.title = element_text(hjust = 0.5))

Based on the above chart we observe a few things:

Conclusion

At times the data needed for analysis is not readily available. As data analysts, data scientists, and sometimes statisticians it is part of our jobs to be equipped with the skills needed to extract and tidy data in both structured and unstructured format. This post provided a brief introduction on how we can extract structured data from web-based files, tidy and then visualize the data to gain insights.


The R Markdown script used to create this file can be found here. Additionally, an interactive chart was created to allow the user to view the turnout within a specific county and can be found here.